14 research outputs found

    Adaptive scheduling for adaptive sampling in POS taggers construction

    We introduce adaptive scheduling for adaptive sampling as a novel machine learning approach to the construction of part-of-speech taggers. The goal is to speed up training on large data sets without significant loss of performance with regard to an optimal configuration. In contrast to previous methods that use a random, fixed or regularly rising spacing between instances, ours analyzes the shape of the learning curve geometrically, in conjunction with a functional model, to increase or decrease the spacing at any time. The algorithm proves to be formally correct with regard to our working hypotheses: given a training case, the next one selected is the nearest that ensures a net gain in learning ability over the former, and the strictness of this condition can be modulated. We also improve the robustness of sampling by paying greater attention to those regions of the training database subject to a temporary inflation in performance, thus preventing learning from stopping prematurely. The proposal has been evaluated on the basis of its reliability in identifying the convergence of models, corroborating our expectations. While a concrete halting condition is used for testing, users can choose any condition whatsoever to suit their specific needs.
    Agencia Estatal de Investigación | Ref. TIN2017-85160-C2-1-R
    Agencia Estatal de Investigación | Ref. TIN2017-85160-C2-2-R
    Xunta de Galicia | Ref. ED431C 2018/50
    Xunta de Galicia | Ref. ED431D 2017/1

    Modeling of learning curves with applications to POS tagging

    An algorithm to estimate the evolution of learning curves on the whole of a training database, based on the results obtained from a portion of it and using a functional strategy, is introduced. We iteratively approximate the sought value at the desired time, independently of the learning technique used, once a point in the process called the prediction level has been passed. The proposal proves to be formally correct with respect to our working hypotheses and includes a reliable proximity condition. This allows the user to fix a convergence threshold with respect to the accuracy finally achievable, which extends the concept of stopping criterion and appears effective even in the presence of distorting observations. Our aim is to evaluate the training effort, supporting decision making in order to reduce the need for both human and computational resources during the learning process. The proposal is of interest in at least three operational procedures. The first is the anticipation of accuracy gain, with the purpose of measuring how much work is needed to achieve a certain degree of performance. The second relates to the comparison of efficiency between systems at training time, with the objective of completing this task only for the one that best suits our requirements. The prediction of accuracy is also a valuable item of information for customizing systems, since we can estimate in advance the impact of settings on both performance and development costs. Using the generation of part-of-speech taggers as an example application, the experimental results are consistent with our expectations.
    Ministerio de Economía y Competitividad | Ref. FFI2014-51978-C2-1-
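A minimal sketch of the functional idea: assuming the residual error follows a power law err(n) = b·n^(−c), one common choice for learning-curve models (the paper's actual functional model may differ), we can fit it on an observed portion of the curve and extrapolate accuracy to larger training-set sizes:

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of error(n) = b * n**(-c) in log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    # Ordinary least squares on log-log data: slope = -c, intercept = log(b).
    c = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = math.exp(my + c * mx)
    return b, c

def predict_accuracy(size, b, c):
    """Extrapolated accuracy at a given training-set size."""
    return 1.0 - b * size ** (-c)

# Synthetic curve with error(n) = 2 * n**-0.5, observed at three sizes:
b, c = fit_power_law([100, 400, 1600], [0.2, 0.1, 0.05])
print(round(predict_accuracy(10000, b, c), 3))  # → 0.98
```

The fit is linear in log-log space, so it needs no iterative optimizer; with real, noisy observations the extrapolation should of course be treated as an estimate, which is precisely where the paper's proximity condition comes in.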

    Undirected dependency parsing

    Dependency parsers, which are widely used in natural language processing tasks, employ a representation of syntax in which the structure of sentences is expressed in the form of directed links (dependencies) between their words. In this article, we introduce a new approach to transition-based dependency parsing in which the parsing algorithm does not directly construct dependencies, but rather undirected links, which are then assigned a direction in a postprocessing step. We show that this alleviates error propagation, because undirected parsers do not need to observe the single-head constraint, resulting in better accuracy. Undirected parsers can be obtained by transforming existing directed transition-based parsers, as long as they satisfy certain conditions. We apply this approach to obtain undirected variants of three different parsers (the Planar, 2-Planar, and Covington algorithms) and perform experiments on several data sets from the CoNLL-X shared tasks and on the Wall Street Journal portion of the Penn Treebank, showing that our approach succeeds in reducing error propagation, produces improvements in parsing accuracy in most cases, and achieves results competitive with state-of-the-art transition-based parsers.
    Xunta de Galicia | Ref. CN2012/008
    Xunta de Galicia | Ref. CN2012/317
    Xunta de Galicia | Ref. CN2012/319
    Ministerio de Ciencia e Innovación | Ref. TIN2010-18552-C03-01
    Ministerio de Ciencia e Innovación | Ref. TIN2010-18552-C03-0
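As a toy illustration of the postprocessing step, the sketch below orients a set of undirected links by traversing them from a chosen root. The article assigns directions using the parser's own model scores, so this breadth-first rule is only a hypothetical stand-in for that step:

```python
from collections import defaultdict, deque

def orient_edges(undirected_edges, root):
    """Turn undirected dependency links into head assignments by breadth-first
    traversal from a chosen root word (0 denotes the artificial root node)."""
    adj = defaultdict(list)
    for u, v in undirected_edges:
        adj[u].append(v)
        adj[v].append(u)
    head = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in head:
                head[v] = u       # v was first reached through u: u heads v
                queue.append(v)
    return head

# Links 1-2, 2-3, 2-4 with word 2 as root: words 1, 3 and 4 all depend on 2.
print(orient_edges([(1, 2), (2, 3), (2, 4)], root=2))
```

Because the traversal visits each undirected link exactly once, every word other than the root receives exactly one head, restoring the single-head constraint that the undirected parser itself never had to enforce.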

    Expanding and discovering Artificial Intelligence content through link aggregators, voting and karma

    Having to bound and delimit the set of contents to be covered in a course is a difficulty we must face when planning the teaching of a subject. This is especially problematic in subjects that are already extensive, such as those related to Artificial Intelligence, and it means having to omit important parts, both theoretical and applied, of a constantly evolving field. In this work we present an attempt to mitigate these problems by using social link aggregators, such as Digg, Reddit or Menéame, which allow students to explore on their own, discover, and share their impressions regarding aspects of the Artificial Intelligence subject that cannot be covered in depth in face-to-face classes. In our case, we have deployed our own link aggregator, proposing a complementary activity whose assessment takes advantage of the reputation, or karma, mechanisms on which this kind of social tool bases its operation
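A minimal sketch of the karma mechanism the assessment relied on, under the assumption (ours, not the paper's) that a student's karma is simply the net votes received on the links they submitted:

```python
def user_karma(votes_by_link, submissions):
    """Each student's karma is the net vote total over the links they shared
    (hypothetical rule; real aggregators weight karma in more refined ways)."""
    return {user: sum(votes_by_link.get(link, 0) for link in links)
            for user, links in submissions.items()}

votes = {"link1": 5, "link2": 3, "link3": -1}
submissions = {"ana": ["link1", "link3"], "brais": ["link2"]}
print(user_karma(votes, submissions))  # → {'ana': 4, 'brais': 3}
```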

    Cross-repository aggregation of educational resources

    The proliferation of educational resource repositories has promoted the development of aggregators to facilitate interoperability, that is, a unified access point that allows users to fetch a given resource independently of its origin. The CROERA system is a repository aggregator that provides access to educational resources independently of the classification taxonomy utilized in the hosting repository. To that end, an automated classification algorithm is trained using the information extracted from the metadata of a collection of educational resources hosted in different repositories, which in turn depends on the classification taxonomy used in each case. Every resource can then be automatically classified on demand, independently of the original classification scheme. As a consequence, resources can be retrieved using any taxonomy supported by the aggregator, regardless of the taxonomy originally utilized, and exploratory searches can be made without a previous taxonomy mapping. This approach overcomes one of the recurring problems in taxonomy mapping, namely the one-to-none matching situation. To evaluate the performance of this proposal, two methods were applied. Resource classification into categories existing in all repositories was automatically evaluated, obtaining maximum performance values of 84% (F1 score), 87.8% (area under the receiver operating characteristic curve), 86% (area under the precision-recall curve) and 75.1% (Cohen's κ). In the case of resources not belonging to one of the common categories, human inspection was used as a reference to compute classification performance. In this case, the maximum performance values obtained were, respectively, 69.8%, 73.8%, 75% and 54.3%. These results demonstrate the potential of this approach as a tool to facilitate resource classification, for example by providing a preliminary classification that would require just minor corrections from human classifiers.
    Xunta de Galicia | Ref. R2014/034 (RedPlir)
    Xunta de Galicia | Ref. R2014/029 (TELGalicia)
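The following sketch illustrates the general idea of training a classifier on repository metadata. It uses a tiny multinomial naive Bayes over metadata keywords; this model choice and the class names are assumptions for illustration, not necessarily the algorithm used in CROERA:

```python
import math
from collections import Counter

class NaiveBayes:
    """Tiny multinomial naive Bayes over metadata keywords, with add-one
    smoothing -- an illustrative stand-in for the classifier trained on
    repository metadata."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        self.vocab = set()
        for words, c in zip(docs, labels):
            self.word_counts[c].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        def log_score(c):
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            s = math.log(self.class_counts[c])
            for w in words:
                s += math.log((self.word_counts[c][w] + 1) / total)
            return s
        return max(self.class_counts, key=log_score)

clf = NaiveBayes().fit(
    [["algebra", "equations"], ["grammar", "verbs"]],
    ["math", "language"])
print(clf.predict(["equations"]))  # → math
```

A new resource is then classified on demand into the aggregator's unified taxonomy, regardless of how its home repository labeled it.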

    Regional error correction with minimal cost

    No full text
    PhD thesis in Computer Science written by Víctor Manuel Darriba Bilbao under the supervision of Dr. Manuel Vilares Ferro of the Universidad de A Coruña (Spain). The thesis was defended on October 25, 2002 before a committee formed by Dr. Roberto Moreno Díaz (Universidad de Las Palmas de Gran Canaria), José Luis Freire Nistal (Universidad de A Coruña), José Gabriel Pereira Lopes (Universidade Nova de Lisboa), Bernard André Dion (Esterel Technologies) and Fernando Martín Rubio (Universidad de Murcia). The grade obtained was Sobresaliente Cum Laude by unanimous decision

    A methodology for the construction of structured text corpora based on XML

    In this article we discuss the most important issues in the definition of a methodology for the development of structured text corpora based on XML.
    Partially funded by the Ministerio de Educación y Ciencia (TIN2004-07246-C03-01), Xunta de Galicia (PGIDIT05PXIC30501PN) and Universidade de Vigo
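As a minimal, hypothetical illustration of the kind of structure such a methodology deals with, the sketch below defines a small XML corpus document (header metadata plus a body segmented into paragraphs and sentences; the element names are ours, not the paper's schema) and reads it back:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of a structured corpus document: a header with
# descriptive metadata plus a body segmented into paragraphs (<p>) and
# sentences (<s>).
corpus_doc = """\
<document id="d001">
  <header>
    <title>Sample text</title>
    <language>gl</language>
  </header>
  <body>
    <p><s>First sentence.</s><s>Second sentence.</s></p>
  </body>
</document>
"""

root = ET.fromstring(corpus_doc)
sentences = [s.text for s in root.iter("s")]
print(sentences)  # → ['First sentence.', 'Second sentence.']
```

Separating descriptive metadata from the segmented text in this way is what makes the corpus queryable at the document, paragraph and sentence levels.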

    Multiword expressions processing in Galician using Deep Learning

    The treatment of Multiword Expressions is still a pending task in Natural Language Processing. In this work, we aim to determine experimentally the usefulness of Machine Learning models for Multiword Expression processing in Galician. To that end, we use CORGA, a 40-million-word corpus, on which we train Deep Learning transformer models and compare their performance with that of more traditional conditional random field models.
    This work was partially funded by the Xunta de Galicia, through the multi-year collaboration agreement between the Centro Ramón Piñeiro para la Investigación en Humanidades and the Universidad de Vigo and the grant for the Consolidation and Structuring of Competitive Research Units ED431C 2018/50, and by the Ministerio de Economía, Industria y Competitividad through project TIN2017-85160-C2-2-R
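Whatever the underlying model (CRF or transformer), MWE taggers commonly emit per-token BIO labels. The sketch below (the "B-MWE"/"I-MWE"/"O" label names are assumed for illustration) shows how tagged spans are collected back into expressions:

```python
def extract_mwes(tokens, tags):
    """Collect multiword-expression spans from per-token BIO labels."""
    mwes, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-MWE":
            if current:                       # close a span already in progress
                mwes.append(" ".join(current))
            current = [tok]
        elif tag == "I-MWE" and current:
            current.append(tok)
        else:                                 # "O" or stray "I-MWE"
            if current:
                mwes.append(" ".join(current))
            current = []
    if current:
        mwes.append(" ".join(current))
    return mwes

# Galician example: "botar unha man" ("to lend a hand") tagged as one MWE.
print(extract_mwes(["Quero", "botar", "unha", "man"],
                   ["O", "B-MWE", "I-MWE", "I-MWE"]))  # → ['botar unha man']
```

Comparing models then reduces to comparing the spans each one predicts against the gold spans, typically with precision, recall and F1.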

    Parsing of incomplete sentences

    No full text
    We describe a parsing algorithm for context-free grammars (CFGs) capable of processing incomplete input, including unknown sequences of unknown length. The parser produces as output a finite shared forest compiling all possible parses of the input, often infinite in number. In contrast with previous work, our proposal makes use of advanced dynamic programming techniques that translate into a notable improvement in the computational performance of the system. We also introduce a deductive construction based on the parsing schemata formalism, which considerably simplifies the descriptive phase.
    This work was partially funded by the Spanish Government through projects TIC2000-0370-C02-01 and HP2001-0044, and by the Autonomous Government of Galicia through projects PGIDT01PXI10506PN, PGIDIT02PXIB30501PR and PGIDIT02SIN01E
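A very simplified cousin of such a parser can be sketched as a CYK recognizer for grammars in Chomsky normal form, where a placeholder token `?` stands for an unknown word and may carry any part of speech. The real system builds a shared forest and handles unknown sequences of unknown length, which this toy does not:

```python
def cyk_recognize(words, grammar, lexicon, start="S"):
    """CNF CYK recognizer where the placeholder '?' stands for an unknown
    word and may carry any part of speech known to the lexicon."""
    n = len(words)
    # table[i][j] holds the categories spanning words[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    pos_tags = {tag for tags in lexicon.values() for tag in tags}
    for i, w in enumerate(words):
        table[i][0] = set(pos_tags) if w == "?" else set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):          # split after k words
                for lhs, (b, c) in grammar:
                    if b in table[i][k - 1] and c in table[i + k][span - k - 1]:
                        table[i][span - 1].add(lhs)
    return start in table[0][n - 1]

grammar = [("S", ("NP", "VP")), ("NP", ("Det", "N"))]
lexicon = {"the": {"Det"}, "dog": {"N"}, "barks": {"VP"}}
print(cyk_recognize(["the", "?", "barks"], grammar, lexicon))  # → True
```

The unknown second word is accepted because some category (here N) completes a parse; a shared-forest parser would additionally record every such completion compactly instead of merely answering yes or no.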

    Optimization in pattern recognition

    No full text
    The goal of this work is to reduce the cost of query evaluation in information retrieval systems that integrate pattern recognition. In this context, we study the application of dynamic programming strategies in situations where structure is shared between patterns. Our intention is to improve the computational performance of this class of strategies, often valued for the quality of their results but discarded because of their complexity in favour of less sophisticated approaches based on simple term indexing. The work includes the first experimental results which, although preliminary, suggest the viability of the approach.
    Work partially funded by project PGIDT99XI10502B of the Xunta de Galicia and project 1FD97-0047-C04-02 of the European Union
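The tabulated computations in question are of the classic dynamic-programming kind. For instance, in the edit-distance table below, the leading rows computed for patterns that share a common prefix are identical and can in principle be reused across patterns, which is the sort of structure sharing studied here (the sketch is ours, not the paper's algorithm):

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two strings.
    Patterns sharing a prefix produce identical leading table rows, which
    is exactly the kind of shared structure that can be exploited when
    matching a query against many patterns."""
    prev = list(range(len(b) + 1))            # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]

print(edit_distance("patron", "padron"))  # → 1
```

Storing the patterns in a trie and computing one row per trie node, instead of one full table per pattern, is the standard way to turn this prefix sharing into an actual saving.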